Abstract

This paper presents the asymptotic distributions of a general likelihood-based test statistic, derived using results of Wilks and Wald. The general form of the test statistic incorporates the test statistics and associated asymptotic formulae previously derived by Cowan, Cranmer, Gross and Vitells, which are seen to be special cases of the likelihood-based test statistic described here.

1 Introduction 

This paper defines likelihood-based test statistics for hypothesis tests of a particular parameter of interest, where the parameter is believed to be physically (or otherwise) constrained to lie within a certain range. A test statistic defined with both an upper and a lower boundary for the parameter of interest will be referred to as a double-bounded test statistic and, in the interest of defining a convention, will be denoted by a letter with a tilde and a bar above it, e.g. $\tilde{\bar{t}}_\mu$ (the subscript $\mu$ labels the value of the parameter of interest that is to be tested with this particular instance of the test statistic). In the special case where the upper boundary is moved to $\infty$, the resulting test statistic is referred to as lower-(single-)bounded, and is denoted by a letter with just a tilde above it (e.g. $\tilde{t}_\mu$). For the case where the lower boundary is moved to $-\infty$, the notation $\bar{t}_\mu$ will be used; this is an upper-(single-)bounded test statistic.

Likelihood-based test statistics can also be either two-sided or one-sided. This nomenclature refers to whether data resulting in a maximum likelihood estimator for the parameter of interest ($\hat{\mu}$) that is not equal to the value being tested ($\mu$) should be considered increasingly incompatible with the hypothesis for all values of $\hat{\mu}$ (a two-sided test statistic), or only when $\hat{\mu}$ deviates in one direction relative to $\mu$ (a one-sided test statistic). The convention used to distinguish these two cases is that two-sided test statistics use the letter $t$, e.g. $t_\mu$, whereas one-sided test statistics use the letter $q$, e.g. $q_\mu$.

It is possible to define any customized region of $\hat{\mu}$ values that are considered fully compatible with a given value of the parameter of interest $\mu$. This type of test statistic will be referred to as a custom-sided test statistic, and the convention will be to use the letter $k$ for such statistics. The definition of a custom-sided test statistic must therefore identify which regions of $\hat{\mu}$ values are compatible with the value of $\mu$ being tested (and therefore where the test statistic should be 0, corresponding to maximal compatibility for likelihood-ratio-based test statistics). These regions of compatibility will be denoted by $m_\mu(\hat{\mu})$, a function that takes the value 1 for $\hat{\mu}$ considered compatible with $\mu$, and 0 otherwise.
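Therefore a two-sided test statistic can be expressed as a custom-sided test statistic with

$$m_\mu(\hat{\mu}) = 0 \quad \forall\,\hat{\mu}. \tag{1}$$

A one-sided test statistic is a custom-sided test statistic with

$$m_\mu(\hat{\mu}) = \begin{cases} 0 & \hat{\mu} < \mu, \\ 1 & \hat{\mu} \geq \mu. \end{cases} \tag{2}$$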
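For readers who want to experiment with these definitions, the two compatibility functions of equations (1) and (2) can be written as simple predicates. This is a minimal Python sketch; the names `m_two_sided` and `m_one_sided` are illustrative and do not come from the paper.

```python
def m_two_sided(mu_hat, mu):
    """Equation (1): no value of mu_hat is treated as compatible with mu."""
    return 0


def m_one_sided(mu_hat, mu):
    """Equation (2): any mu_hat at or above the tested mu is fully compatible."""
    return 1 if mu_hat >= mu else 0


# Testing mu = 2.0 against a few estimator values
for mu_hat in (1.0, 2.0, 3.0):
    print(mu_hat, m_two_sided(mu_hat, 2.0), m_one_sided(mu_hat, 2.0))
```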
Asymptotic distributions for lower-bounded (at $\mu = 0$) one-sided and two-sided, and for double-bounded two-sided, likelihood-based test statistics have previously been derived by Cowan, Cranmer, Gross and Vitells ([1] and [2]). This paper presents the asymptotic distribution of a more general likelihood-based test statistic $\tilde{\bar{k}}_\mu$, a double-bounded custom-sided test statistic. The distribution is derived in section 4 using approximate methods based on results due to Wilks [3] and Wald [4]. Section 2 summarises the test statistic and the resulting asymptotic cumulative probability distribution and probability density function, including how previously derived asymptotic distributions can be recovered as special cases of the test statistic. Section 3 presents a Monte Carlo study for a hypothetical analysis, designed to test the validity of the asymptotic formula presented in section 2.

2 Distribution of $\tilde{\bar{k}}_\mu$

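The double-bounded custom-sided test statistic $\tilde{\bar{k}}_\mu$ is defined as

$$\tilde{\bar{k}}_\mu = \begin{cases} 0 & m_\mu(\hat{\mu}) = 1, \\ -2\ln\dfrac{L(\mu,\hat{\hat{\theta}}(\mu))}{L(\mu_L,\hat{\hat{\theta}}(\mu_L))} & m_\mu(\hat{\mu}) = 0,\ \hat{\mu} < \mu_L, \\ -2\ln\dfrac{L(\mu,\hat{\hat{\theta}}(\mu))}{L(\hat{\mu},\hat{\theta})} & m_\mu(\hat{\mu}) = 0,\ \mu_L \leq \hat{\mu} \leq \mu_H, \\ -2\ln\dfrac{L(\mu,\hat{\hat{\theta}}(\mu))}{L(\mu_H,\hat{\hat{\theta}}(\mu_H))} & m_\mu(\hat{\mu}) = 0,\ \hat{\mu} > \mu_H, \end{cases} \tag{3}$$

where $\mu_L$ and $\mu_H$ are the lower and upper bounds on the parameter of interest, $\hat{\mu}$ and $\hat{\theta}$ are the unconditional maximum likelihood estimators, $\hat{\hat{\theta}}(\mu)$ is the conditional maximum likelihood estimator of the nuisance parameters for fixed $\mu$, and the compatibility function $m_\mu(\hat{\mu})$ is defined as:

$$m_\mu(\hat{\mu}) = \begin{cases} 1 & \hat{\mu} \text{ is considered compatible with } \mu, \\ 0 & \text{otherwise.} \end{cases} \tag{4}$$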



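The asymptotic cumulative distribution of $\tilde{\bar{k}}_\mu$ under the Wald approximation (where $\hat{\mu}$ follows a Gaussian distribution with mean $\mu'$ and standard deviation $\sigma$) is found to be (see section 4):

$$\begin{aligned} F(\tilde{\bar{k}}_\mu|\mu') = \Phi_m(\infty) - 1 &+ \left[1 - H(\tilde{\bar{k}}_\mu - k^L_\mu)\right]\left[\Phi\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right) + \Phi_m\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right)\right] \\ &+ H(\tilde{\bar{k}}_\mu - k^L_\mu)\left[\Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right) + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right)\right] \\ &+ \left[1 - H(\tilde{\bar{k}}_\mu - k^H_\mu)\right]\left[\Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right)\right] \\ &+ H(\tilde{\bar{k}}_\mu - k^H_\mu)\left[\Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right) - \Phi_m\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right)\right], \end{aligned} \tag{5}$$

where $\Phi$ is the cumulative distribution function of a standard Gaussian, the quantities $k^L_\mu$, $k^H_\mu$, $\Delta_y$, $\Delta_L$, $\sigma_L$, $\Delta_H$, $\sigma_H$ are defined as:

$$k^L_\mu = \frac{(\mu - \mu_L)^2}{\sigma^2}, \tag{6}$$

$$k^H_\mu = \frac{(\mu_H - \mu)^2}{\sigma^2}, \tag{7}$$

$$\Delta_y = \frac{\mu - \mu'}{\sigma}, \tag{8}$$

$$\Delta_L = \frac{\mu^2 - \mu_L^2 - 2(\mu - \mu_L)\mu'}{\sigma^2}, \tag{9}$$

$$\sigma_L = \frac{2(\mu - \mu_L)}{\sigma}, \tag{10}$$

$$\Delta_H = \frac{\mu^2 - \mu_H^2 - 2(\mu - \mu_H)\mu'}{\sigma^2}, \tag{11}$$

$$\sigma_H = \frac{2(\mu_H - \mu)}{\sigma}, \tag{12}$$

the masked Gaussian integral $\Phi_m(x)$ is defined as:

$$\Phi_m(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left(-\frac{x'^2}{2}\right) m_\mu(x'\sigma + \mu')\,dx', \tag{13}$$

and $H(x)$ is the discrete Heaviside function:

$$H(x) = \begin{cases} 0 & x < 0, \\ 1 & x \geq 0. \end{cases} \tag{14}$$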
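Equations (5) to (14) map almost directly onto code. The following Python sketch is one possible implementation, not the author's; it assumes, as a simplification, that the compatible region of $m_\mu$ is a finite union of non-overlapping $\hat{\mu}$ intervals, so that $\Phi_m$ can be evaluated exactly from differences of $\Phi$. Infinite bounds are passed as `math.inf`, and all function names are hypothetical.

```python
import math


def Phi(x):
    """Standard Gaussian CDF; math.erf handles +/-inf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def masked_Phi(x, intervals, mu_prime, sigma):
    """Eq. (13), assuming m_mu(mu_hat) = 1 exactly on the union of the
    non-overlapping mu_hat intervals (a, b) and 0 elsewhere."""
    total = 0.0
    for a, b in intervals:
        lo = (a - mu_prime) / sigma          # interval start in units of x
        hi = min(x, (b - mu_prime) / sigma)  # clip at the upper limit x
        if hi > lo:
            total += Phi(hi) - Phi(lo)
    return total


def cdf_k(k, mu, mu_prime, sigma, mu_L, mu_H, intervals):
    """Asymptotic CDF of the double-bounded custom-sided statistic, eq. (5)."""
    if k < 0:
        return 0.0
    Pm = lambda x: masked_Phi(x, intervals, mu_prime, sigma)
    d_y = (mu - mu_prime) / sigma            # eq. (8)
    kL = ((mu - mu_L) / sigma) ** 2          # eq. (6); inf if mu_L = -inf
    kH = ((mu_H - mu) / sigma) ** 2          # eq. (7); inf if mu_H = +inf
    r = math.sqrt(k)
    if k < kL:                               # H(k - k^L) = 0
        low = Phi(r - d_y) + Pm(d_y - r)
    else:
        d_L = (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_prime) / sigma**2
        s_L = 2 * (mu - mu_L) / sigma
        low = Phi((k - d_L) / s_L) + Pm((d_L - k) / s_L)
    if k < kH:                               # H(k - k^H) = 0
        high = Phi(r + d_y) - Pm(d_y + r)
    else:
        d_H = (mu**2 - mu_H**2 - 2 * (mu - mu_H) * mu_prime) / sigma**2
        s_H = 2 * (mu_H - mu) / sigma
        high = Phi((k - d_H) / s_H) - Pm((k - d_H) / s_H)
    return Pm(math.inf) - 1.0 + low + high
```

For example, `cdf_k(0.0, 1.0, 0.0, 1.0, -math.inf, math.inf, [(1.0, math.inf)])` returns $\Phi_m(\infty) = \Phi(-1)$, the weight of the $\delta$-function at zero for an unbounded one-sided statistic with $(\mu - \mu')/\sigma = 1$.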



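The pdf is given by:

$$\begin{aligned} f(\tilde{\bar{k}}_\mu|\mu') = \Phi_m(\infty)\,\delta(\tilde{\bar{k}}_\mu) &+ \left[1 - H(\tilde{\bar{k}}_\mu - k^L_\mu)\right]\frac{1}{2\sqrt{2\pi\tilde{\bar{k}}_\mu}}\exp\left[-\frac{1}{2}\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right)^2\right] n_\mu\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) \\ &+ H(\tilde{\bar{k}}_\mu - k^L_\mu)\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\left[-\frac{1}{2}\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right)^2\right] n_\mu\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) \\ &+ \left[1 - H(\tilde{\bar{k}}_\mu - k^H_\mu)\right]\frac{1}{2\sqrt{2\pi\tilde{\bar{k}}_\mu}}\exp\left[-\frac{1}{2}\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right)^2\right] n_\mu\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right) \\ &+ H(\tilde{\bar{k}}_\mu - k^H_\mu)\frac{1}{\sqrt{2\pi}\,\sigma_H}\exp\left[-\frac{1}{2}\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right)^2\right] n_\mu\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right), \end{aligned} \tag{15}$$

where

$$n_\mu(x) = 1 - m_\mu(x\sigma + \mu'). \tag{16}$$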
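A quick numerical check of equation (15): the continuous part of the density should integrate to $1 - \Phi_m(\infty)$, with the remaining probability carried by the $\delta$-function at zero. The Python sketch below (hypothetical helper names) takes $m_\mu$ as an arbitrary 0/1 callable, substitutes $\tilde{\bar{k}}_\mu = u^2$ to remove the $1/\sqrt{\tilde{\bar{k}}_\mu}$ singularity, and integrates with a plain midpoint rule; the cut-off `u_max` and the step count are arbitrary assumptions.

```python
import math


def phi(x):
    """Standard Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def pdf_k_continuous(k, mu, mu_prime, sigma, mu_L, mu_H, m):
    """Continuous part of eq. (15) for k > 0 (the delta term is omitted).
    m(mu_hat) must return 0 or 1."""
    n = lambda x: 1 - m(x * sigma + mu_prime)      # eq. (16)
    d_y = (mu - mu_prime) / sigma
    kL = ((mu - mu_L) / sigma) ** 2
    kH = ((mu_H - mu) / sigma) ** 2
    r = math.sqrt(k)
    if k < kL:
        low = phi(r - d_y) * n(d_y - r) / (2 * r)
    else:
        d_L = (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_prime) / sigma**2
        s_L = 2 * (mu - mu_L) / sigma
        low = phi((k - d_L) / s_L) * n((d_L - k) / s_L) / s_L
    if k < kH:
        high = phi(r + d_y) * n(d_y + r) / (2 * r)
    else:
        d_H = (mu**2 - mu_H**2 - 2 * (mu - mu_H) * mu_prime) / sigma**2
        s_H = 2 * (mu_H - mu) / sigma
        high = phi((k - d_H) / s_H) * n((k - d_H) / s_H) / s_H
    return low + high


def continuous_mass(mu, mu_prime, sigma, mu_L, mu_H, m, u_max=12.0, steps=60000):
    """Integrate the continuous part over k = u**2 (dk = 2u du), midpoint rule."""
    h = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += pdf_k_continuous(u * u, mu, mu_prime, sigma, mu_L, mu_H, m) * 2 * u * h
    return total
```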
2.1 Special Cases of the Test Statistic

The likelihood-based test statistics that were originally described in [1] and [2] can be considered as special cases of the general double-bounded custom-sided test statistic.
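2.1.1 Distribution of $t_\mu$

The unbounded two-sided test statistic $t_\mu$ is defined by:

$$\mu_L = -\infty, \tag{17}$$

$$\mu_H = \infty, \tag{18}$$

$$m_\mu(\hat{\mu}) = 0 \quad \forall\,\hat{\mu}. \tag{19}$$

It has the following properties:

$$\Phi_m(\infty) = 0, \tag{20}$$

$$n_\mu(x) = 1, \tag{21}$$

$$k^L_\mu = \infty, \tag{22}$$

$$k^H_\mu = \infty. \tag{23}$$

Therefore the pdf of $t_\mu$ is given by:

$$f(t_\mu|\mu') = \frac{1}{2}\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{t_\mu}}\left[\exp\left(-\frac{1}{2}\left(\sqrt{t_\mu} - \Delta_y\right)^2\right) + \exp\left(-\frac{1}{2}\left(\sqrt{t_\mu} + \Delta_y\right)^2\right)\right]. \tag{24}$$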
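Integrating equation (24) gives $F(t_\mu|\mu') = \Phi(\sqrt{t_\mu} - \Delta_y) + \Phi(\sqrt{t_\mu} + \Delta_y) - 1$, which can be compared with toy values of $t_\mu = (\mu - \hat{\mu})^2/\sigma^2$ drawn under the Wald approximation. The Python sketch below does this with an arbitrary seed and sample size; the function names are illustrative. For $\mu' = \mu$ the result reduces to the $\chi^2$ distribution with one degree of freedom.

```python
import math
import random


def Phi(x):
    """Standard Gaussian CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_t(t, mu, mu_prime, sigma):
    """CDF of t_mu whose derivative is the pdf of eq. (24)."""
    d_y = (mu - mu_prime) / sigma
    r = math.sqrt(t)
    return Phi(r - d_y) + Phi(r + d_y) - 1.0


def mc_cdf_t(t, mu, mu_prime, sigma, n=200000, seed=1):
    """Fraction of toy mu_hat ~ N(mu', sigma) giving (mu - mu_hat)^2/sigma^2 < t."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        mu_hat = rng.gauss(mu_prime, sigma)
        if ((mu - mu_hat) / sigma) ** 2 < t:
            hits += 1
    return hits / n


print(cdf_t(2.0, 1.0, 0.0, 1.5), mc_cdf_t(2.0, 1.0, 0.0, 1.5))
```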



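2.1.2 Distribution of $\tilde{t}_\mu$

The lower-bounded two-sided test statistic $\tilde{t}_\mu$ is defined by:

$$\mu_L = 0, \qquad \mu_H = \infty, \qquad m_\mu(\hat{\mu}) = 0 \quad \forall\,\hat{\mu},$$

where $\mu_L$ has been chosen as 0 for consistency with [1]. It has the following properties:

$$\Phi_m(\infty) = 0, \qquad n_\mu(x) = 1, \qquad k^L_\mu = \frac{\mu^2}{\sigma^2}, \qquad k^H_\mu = \infty,$$

with $\Delta_L = (\mu^2 - 2\mu\mu')/\sigma^2$ and $\sigma_L = 2\mu/\sigma$. Therefore the pdf of $\tilde{t}_\mu$ is given by:

$$f(\tilde{t}_\mu|\mu') = \frac{1}{2\sqrt{2\pi\tilde{t}_\mu}}\exp\left[-\frac{1}{2}\left(\sqrt{\tilde{t}_\mu} + \Delta_y\right)^2\right] + \begin{cases} \dfrac{1}{2\sqrt{2\pi\tilde{t}_\mu}}\exp\left[-\dfrac{1}{2}\left(\sqrt{\tilde{t}_\mu} - \Delta_y\right)^2\right] & \tilde{t}_\mu < \mu^2/\sigma^2, \\ \dfrac{1}{\sqrt{2\pi}\,(2\mu/\sigma)}\exp\left[-\dfrac{1}{2}\dfrac{\left(\tilde{t}_\mu - (\mu^2 - 2\mu\mu')/\sigma^2\right)^2}{(2\mu/\sigma)^2}\right] & \tilde{t}_\mu \geq \mu^2/\sigma^2. \end{cases}$$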
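2.1.3 Distribution of $q_\mu$

The unbounded one-sided test statistic $q_\mu$ is defined by:

$$\mu_L = -\infty, \qquad \mu_H = \infty, \qquad m_\mu(\hat{\mu}) = \begin{cases} 0 & \hat{\mu} < \mu, \\ 1 & \hat{\mu} \geq \mu. \end{cases}$$

It has the following properties:

$$\Phi_m(\infty) = \Phi\left(\frac{\mu' - \mu}{\sigma}\right), \qquad k^L_\mu = k^H_\mu = \infty.$$

One sees that:

$$\left(\Delta_y + \sqrt{q_\mu}\right)\sigma + \mu' \geq \Delta_y\sigma + \mu' = \mu,$$

$$\left(\Delta_y - \sqrt{q_\mu}\right)\sigma + \mu' \leq \Delta_y\sigma + \mu' = \mu.$$

Therefore, for $q_\mu > 0$:

$$n_\mu\left(\Delta_y + \sqrt{q_\mu}\right) = 0, \qquad n_\mu\left(\Delta_y - \sqrt{q_\mu}\right) = 1,$$

leading to the pdf of $q_\mu$ being given by:

$$f(q_\mu|\mu') = \Phi\left(\frac{\mu' - \mu}{\sigma}\right)\delta(q_\mu) + \frac{1}{2\sqrt{2\pi q_\mu}}\exp\left[-\frac{1}{2}\left(\sqrt{q_\mu} - \Delta_y\right)^2\right].$$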
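The property $\Phi_m(\infty) = \Phi((\mu' - \mu)/\sigma)$ can also be confirmed by evaluating the masked Gaussian integral (13) numerically for a one-sided mask supplied as an arbitrary 0/1 callable. This Python sketch truncates the integration range at $\pm 12$ and uses a midpoint rule, both assumptions made for illustration; the parameter values are hypothetical.

```python
import math


def masked_Phi(x, m, mu_prime, sigma, lo=-12.0, steps=200000):
    """Midpoint-rule evaluation of eq. (13) for a 0/1 mask m(mu_hat).
    Both -inf and +inf are truncated at +/-12 standard deviations."""
    hi = min(x, 12.0)
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        t = lo + (i + 0.5) * h
        total += math.exp(-0.5 * t * t) * m(t * sigma + mu_prime)
    return total * h / math.sqrt(2.0 * math.pi)


mu, mu_prime, sigma = 2.0, 1.5, 0.8                      # hypothetical values
m_one_sided = lambda mu_hat: 1 if mu_hat >= mu else 0   # eq. (2)
numeric = masked_Phi(math.inf, m_one_sided, mu_prime, sigma)
closed = 0.5 * (1.0 + math.erf((mu_prime - mu) / sigma / math.sqrt(2.0)))
print(numeric, closed)
```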
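2.1.4 Distribution of $\tilde{q}_\mu$

The lower-bounded one-sided test statistic $\tilde{q}_\mu$ is defined by:

$$\mu_L = 0, \qquad \mu_H = \infty, \qquad m_\mu(\hat{\mu}) = \begin{cases} 0 & \hat{\mu} < \mu, \\ 1 & \hat{\mu} \geq \mu. \end{cases}$$

It has the following properties:

$$\Phi_m(\infty) = \Phi\left(\frac{\mu' - \mu}{\sigma}\right), \qquad k^L_\mu = \frac{\mu^2}{\sigma^2}, \qquad k^H_\mu = \infty,$$

with $\Delta_L = (\mu^2 - 2\mu\mu')/\sigma^2$ and $\sigma_L = 2\mu/\sigma$. One sees that:

$$\left(\Delta_y + \sqrt{\tilde{q}_\mu}\right)\sigma + \mu' \geq \Delta_y\sigma + \mu' = \mu,$$

$$\left(\Delta_y - \sqrt{\tilde{q}_\mu}\right)\sigma + \mu' \leq \Delta_y\sigma + \mu' = \mu.$$

Therefore:

$$n_\mu\left(\Delta_y + \sqrt{\tilde{q}_\mu}\right) = 0, \qquad n_\mu\left(\Delta_y - \sqrt{\tilde{q}_\mu}\right) = 1.$$

Similarly, for $k^L_\mu < \tilde{q}_\mu$ one finds:

$$\frac{\Delta_L - \tilde{q}_\mu}{\sigma_L}\sigma + \mu' < \frac{\Delta_L - k^L_\mu}{\sigma_L}\sigma + \mu' = \mu_L < \mu.$$

Therefore:

$$n_\mu\left(\frac{\Delta_L - \tilde{q}_\mu}{\sigma_L}\right) = 1,$$

leading to the pdf of $\tilde{q}_\mu$ being given by:

$$f(\tilde{q}_\mu|\mu') = \Phi\left(\frac{\mu' - \mu}{\sigma}\right)\delta(\tilde{q}_\mu) + \begin{cases} \dfrac{1}{2\sqrt{2\pi\tilde{q}_\mu}}\exp\left[-\dfrac{1}{2}\left(\sqrt{\tilde{q}_\mu} - \Delta_y\right)^2\right] & 0 < \tilde{q}_\mu < \mu^2/\sigma^2, \\ \dfrac{1}{\sqrt{2\pi}\,(2\mu/\sigma)}\exp\left[-\dfrac{1}{2}\dfrac{\left(\tilde{q}_\mu - (\mu^2 - 2\mu\mu')/\sigma^2\right)^2}{(2\mu/\sigma)^2}\right] & \tilde{q}_\mu \geq \mu^2/\sigma^2. \end{cases}$$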
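Two properties of the $\tilde{q}_\mu$ density are easy to check numerically: it is continuous at $\tilde{q}_\mu = \mu^2/\sigma^2$, where the two branches meet, and its continuous part integrates to $1 - \Phi((\mu' - \mu)/\sigma)$. The following Python sketch uses hypothetical parameter values and a midpoint rule after substituting $\tilde{q}_\mu = u^2$; the function names are illustrative.

```python
import math


def pdf_q_tilde(q, mu, mu_prime, sigma):
    """Continuous part (q > 0) of the pdf of q~_mu. The delta-function term
    Phi((mu' - mu)/sigma) * delta(q) is omitted; mu > 0 is assumed."""
    d_y = (mu - mu_prime) / sigma
    if q < mu**2 / sigma**2:
        return math.exp(-0.5 * (math.sqrt(q) - d_y) ** 2) / (2.0 * math.sqrt(2.0 * math.pi * q))
    d_L = (mu**2 - 2.0 * mu * mu_prime) / sigma**2   # Delta_L with mu_L = 0
    s_L = 2.0 * mu / sigma                           # sigma_L with mu_L = 0
    return math.exp(-0.5 * ((q - d_L) / s_L) ** 2) / (math.sqrt(2.0 * math.pi) * s_L)


def continuous_mass(mu, mu_prime, sigma, u_max=15.0, steps=50000):
    """Integral of the continuous part, using q = u**2 and the midpoint rule."""
    h = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += pdf_q_tilde(u * u, mu, mu_prime, sigma) * 2 * u * h
    return total


mu, mu_prime, sigma = 1.0, 0.3, 0.5   # hypothetical example values
edge = mu**2 / sigma**2
print(pdf_q_tilde(edge * (1 - 1e-12), mu, mu_prime, sigma), pdf_q_tilde(edge, mu, mu_prime, sigma))
print(continuous_mass(mu, mu_prime, sigma))
```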
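2.1.5 Distribution of $\tilde{\bar{t}}_\mu$

The double-bounded two-sided test statistic $\tilde{\bar{t}}_\mu$ is defined by:

$$m_\mu(\hat{\mu}) = 0 \quad \forall\,\hat{\mu}.$$

Thus all terms involving $m_\mu$ vanish ($\Phi_m \equiv 0$ and $n_\mu \equiv 1$), and the pdf becomes:

$$\begin{aligned} f(\tilde{\bar{t}}_\mu|\mu') = {} & \left[1 - H(\tilde{\bar{t}}_\mu - k^L_\mu)\right]\frac{1}{2\sqrt{2\pi\tilde{\bar{t}}_\mu}}\exp\left[-\frac{1}{2}\left(\sqrt{\tilde{\bar{t}}_\mu} - \Delta_y\right)^2\right] + H(\tilde{\bar{t}}_\mu - k^L_\mu)\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\left[-\frac{1}{2}\left(\frac{\tilde{\bar{t}}_\mu - \Delta_L}{\sigma_L}\right)^2\right] \\ & + \left[1 - H(\tilde{\bar{t}}_\mu - k^H_\mu)\right]\frac{1}{2\sqrt{2\pi\tilde{\bar{t}}_\mu}}\exp\left[-\frac{1}{2}\left(\sqrt{\tilde{\bar{t}}_\mu} + \Delta_y\right)^2\right] + H(\tilde{\bar{t}}_\mu - k^H_\mu)\frac{1}{\sqrt{2\pi}\,\sigma_H}\exp\left[-\frac{1}{2}\left(\frac{\tilde{\bar{t}}_\mu - \Delta_H}{\sigma_H}\right)^2\right], \end{aligned} \tag{59}$$

where the quantities $k^L_\mu$, $k^H_\mu$, $\Delta_L$, $\sigma_L$, $\Delta_H$, $\sigma_H$ and $\Delta_y$ follow those given in equations (6) to (12). This expression is compatible with the one presented in [2] up to a couple of sign differences in the final term, which this author believes are nothing more than unintentional typographical errors in the result of [2].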
3 Toy MC Study



The accuracy of the asymptotic formula for the distribution was tested with the following example.

One imagines a hypothetical experiment in which a number of events $n$ is counted, and this count is assumed to be modelled by a Poisson distribution with expectation $E[n] = \mu L\epsilon + b$. Here $b$ is taken to be a background contribution, which will be constrained by a Gaussian term. $L$ could represent a luminosity measurement, and $\epsilon$ an acceptance for a hypothesised signal whose cross-section is given by $\mu$. $L$ and $\epsilon$ will also be constrained by Gaussian terms, such that if one were to observe $n$ events, the likelihood function for this observed data would be:

$$L(\mu, b, L, \epsilon) = P(n|\mu L\epsilon + b)\,G(b_0|b, \sigma_b)\,G(L_0|L, \sigma_L)\,G(\epsilon_0|\epsilon, \sigma_\epsilon), \tag{60}$$

where $b_0$, $L_0$ and $\epsilon_0$ represent auxiliary measurements that constrain the nuisance parameters, and $\sigma_b$, $\sigma_L$ and $\sigma_\epsilon$ are the corresponding uncertainties on those auxiliary measurements. $P(n|x)$ is the Poisson distribution with mean $x$, evaluated at $n$, and $G(x|a, b)$ is a Gaussian distribution with mean $a$ and standard deviation $b$, evaluated at $x$.
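The likelihood (60) can be coded as a negative log-likelihood, the form usually passed to a minimiser. The Python sketch below is illustrative only: the paper does not quote $b_0$, $L_0$, $\epsilon_0$ or their uncertainties, so the values in `aux` are placeholders, and no fitting step is shown.

```python
import math


def nll(mu, b, L, eps, n, b0, L0, eps0, sig_b, sig_L, sig_eps):
    """Negative log of the likelihood in eq. (60)."""
    lam = mu * L * eps + b                 # Poisson expectation
    if lam <= 0:
        return math.inf                    # expectation must be positive
    # -ln P(n | lam) = lam - n ln(lam) + ln(n!)
    total = lam - n * math.log(lam) + math.lgamma(n + 1)
    # -ln G(obs | par, s) for each auxiliary measurement
    for obs, par, s in ((b0, b, sig_b), (L0, L, sig_L), (eps0, eps, sig_eps)):
        total += 0.5 * ((obs - par) / s) ** 2 + math.log(s * math.sqrt(2.0 * math.pi))
    return total


# Placeholder auxiliary measurements (not the values used in the paper)
aux = dict(b0=20.0, L0=10.0, eps0=0.5, sig_b=2.0, sig_L=0.5, sig_eps=0.05)
print(nll(8.2, 20.0, 10.0, 0.5, 61, **aux))
```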

Suppose, for the purposes of demonstration, that $\mu$ was believed to lie somewhere between $-5$ and $20$, and under the null hypothesis was believed to be equal to 4. One seeks to assess deviations from this null hypothesis and to set confidence limits on the parameter of interest $\mu$. Therefore one defines the test statistic $\tilde{\bar{k}}_\mu$ with the following parameters:

$$\mu_L = -5, \tag{61}$$

$$\mu_H = 20, \tag{62}$$

$$m_\mu(\hat{\mu}) = \begin{cases} 1 & \mu < 4,\ \hat{\mu} < \mu, \\ 1 & \mu > 4,\ \hat{\mu} > \mu, \\ 0 & \text{otherwise}. \end{cases} \tag{63}$$



The compatibility function is defined such that when testing a value of $\mu$ less than the null value of 4, data corresponding to values of $\hat{\mu} < \mu$ are considered fully compatible with the hypothesis of $\mu$, whereas when testing values of $\mu > 4$, only data corresponding to values of $\hat{\mu} > \mu$ are considered compatible. For $\mu = 4$ itself no region of $\hat{\mu}$ is compatible, so the test statistic is two-sided there.
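The compatibility function (63) of the toy study is equally short in code. This Python sketch simply transcribes equations (61) to (63); the constant and function names are illustrative.

```python
MU_NULL = 4.0             # null-hypothesis value of mu in the toy study
MU_L, MU_H = -5.0, 20.0   # bounds of eqs. (61) and (62)


def m_toy(mu_hat, mu):
    """Compatibility function of eq. (63)."""
    if mu < MU_NULL and mu_hat < mu:
        return 1   # below the null: estimates further down are compatible
    if mu > MU_NULL and mu_hat > mu:
        return 1   # above the null: estimates further up are compatible
    return 0       # includes every mu_hat when mu == MU_NULL


print([m_toy(x, 2.0) for x in (1.0, 3.0)], [m_toy(x, 10.0) for x in (8.0, 12.0)])
```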

For the example, values for the global observables ($b_0$, $L_0$ and $\epsilon_0$) and their uncertainties were somewhat arbitrarily chosen such that $\mu = -5$ corresponded to $(\mu L\epsilon + b) \approx 24$ and $\mu = 20$ corresponded to $(\mu L\epsilon + b) \approx 113$; therefore, for all possible true values $\mu'$ of the parameter of interest, the expected number of events is positive and not close to 0. For completeness, the nuisance parameter values used for generating pseudo-data for the global observables ($b_0$, $L_0$ and $\epsilon_0$) were taken as the conditional maximum likelihood estimator values under the respective $\mu'$ hypothesis, using a data sample of 61 observed events. The standard deviation of the parameter of interest estimator ($\sigma$) is estimated from the Asimov dataset, as discussed in [1].

Figure 1 shows distributions of the test statistic $\tilde{\bar{k}}_\mu$ for a selection of values of $\mu$, under the hypothesis of four different true values of the parameter of interest $\mu'$. The distributions were generated using a Monte Carlo technique and are shown as dashed lines. The distributions obtained from the asymptotic formula of section 2 are shown as solid curves.
[Figure 1 panels: (a) $\mu'$ = …; (b) $\mu' = 4$; (c) $\mu' = 8$; (d) $\mu' = 18$]

Figure 1: Distributions of the test statistic $\tilde{\bar{k}}_\mu$ with $\mu$ at values of $-2$, 5, 10 and 18, for different true values of the parameter of interest $\mu'$.



4 Derivation of Distribution 

One seeks the sampling distribution for the test statistic defined in equation (3). The method used is to determine the cumulative probability distribution under the Wald approximation, and then to differentiate it to arrive at the pdf.

The Wald approximation for a single parameter of interest states that:

$$-2\ln\frac{L(\mu,\hat{\hat{\theta}}(\mu))}{L(\hat{\mu},\hat{\theta})} \approx \frac{(\mu - \hat{\mu})^2}{\sigma^2}, \tag{64}$$

which is valid in the limit of large data samples. Under the assumption of a particular true but unknown value of the signal strength parameter, $\mu'$, the maximum likelihood estimator of the signal strength, $\hat{\mu}$, follows a Gaussian distribution with mean $\mu'$ and standard deviation $\sigma$. For later convenience, this can be expressed as

$$\hat{\mu} = \sigma x + \mu', \tag{65}$$

where $x$ is a random variable following a standard normal distribution (mean 0, standard deviation 1).
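Using the approximation (64) one is able to write:

$$\tilde{\bar{k}}_\mu \approx \begin{cases} 0 & m_\mu(\hat{\mu}) = 1, \\ \dfrac{\mu^2 - \mu_L^2 - 2(\mu - \mu_L)\hat{\mu}}{\sigma^2} & m_\mu(\hat{\mu}) = 0,\ \hat{\mu} < \mu_L, \\ \dfrac{(\mu - \hat{\mu})^2}{\sigma^2} & m_\mu(\hat{\mu}) = 0,\ \mu_L \leq \hat{\mu} \leq \mu_H, \\ \dfrac{\mu^2 - \mu_H^2 - 2(\mu - \mu_H)\hat{\mu}}{\sigma^2} & m_\mu(\hat{\mu}) = 0,\ \hat{\mu} > \mu_H, \end{cases} \tag{66}$$

where the expressions in the second and last cases come from expressing the associated log-likelihood ratio as the difference of two log-likelihood ratios, each taken relative to the unconditional maximum $L(\hat{\mu},\hat{\theta})$, before applying the approximation (64); for example:

$$-2\ln\frac{L(\mu,\hat{\hat{\theta}}(\mu))}{L(\mu_L,\hat{\hat{\theta}}(\mu_L))} = -2\ln\frac{L(\mu,\hat{\hat{\theta}}(\mu))}{L(\hat{\mu},\hat{\theta})} + 2\ln\frac{L(\mu_L,\hat{\hat{\theta}}(\mu_L))}{L(\hat{\mu},\hat{\theta})} \approx \frac{(\mu - \hat{\mu})^2}{\sigma^2} - \frac{(\mu_L - \hat{\mu})^2}{\sigma^2} = \frac{\mu^2 - \mu_L^2 - 2(\mu - \mu_L)\hat{\mu}}{\sigma^2}. \tag{67}$$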



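The possible values of $\hat{\mu}$ range from $-\infty$ to $\infty$, but one finds that each region of $\hat{\mu}$ produces values of $\tilde{\bar{k}}_\mu$ in certain ranges. For the case $\hat{\mu} < \mu_L$, because $(\mu - \mu_L) > 0$ (one assumes that the test statistic would not be constructed for a value of $\mu$ outside the physically allowed region), the second case of equation (66) decreases as $\hat{\mu}$ increases, so this range of $\hat{\mu}$ produces values of $\tilde{\bar{k}}_\mu$ greater than a minimum value. A similar argument applies to $\hat{\mu} > \mu_H$, where $(\mu - \mu_H) < 0$:

$$\begin{aligned} \tilde{\bar{k}}_\mu &> \frac{\mu^2 - \mu_L^2 - 2(\mu - \mu_L)\mu_L}{\sigma^2} = \frac{(\mu - \mu_L)^2}{\sigma^2} \equiv k^L_\mu, & m_\mu(\hat{\mu}) &= 0,\ \hat{\mu} < \mu_L, \\ \tilde{\bar{k}}_\mu &> \frac{\mu^2 - \mu_H^2 - 2(\mu - \mu_H)\mu_H}{\sigma^2} = \frac{(\mu_H - \mu)^2}{\sigma^2} \equiv k^H_\mu, & m_\mu(\hat{\mu}) &= 0,\ \hat{\mu} > \mu_H, \end{aligned} \tag{68}$$

where $k^L_\mu$ and $k^H_\mu$ are introduced as convenient labels for the values defining the boundaries of each of these regions. The order of these limits defines two cases that must be treated separately. We start with the case where $|\mu - \mu_L| < |\mu - \mu_H|$, i.e. where one has formed the test statistic for a value of $\mu$ that is closer to the lower bound than it is to the upper bound.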
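Equation (66) and the lower bounds in (68) can be verified with a few lines of Python. The sketch below (hypothetical names and parameter values, two-sided mask) checks that $\hat{\mu} < \mu_L$ and $\hat{\mu} > \mu_H$ give values above $k^L_\mu$ and $k^H_\mu$ respectively, and that the approximate test statistic is continuous at the bounds.

```python
def k_wald(mu_hat, mu, sigma, mu_L, mu_H, m):
    """Approximate test statistic of eq. (66); m(mu_hat, mu) returns 0 or 1."""
    if m(mu_hat, mu) == 1:
        return 0.0
    if mu_hat < mu_L:
        return (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_hat) / sigma**2
    if mu_hat > mu_H:
        return (mu**2 - mu_H**2 - 2 * (mu - mu_H) * mu_hat) / sigma**2
    return (mu - mu_hat) ** 2 / sigma**2


# Hypothetical example: mu = 2 tested inside the range [0, 10], two-sided mask
mu, sigma, mu_L, mu_H = 2.0, 1.0, 0.0, 10.0
two_sided = lambda mu_hat, mu: 0
kL = (mu - mu_L) ** 2 / sigma**2   # eq. (68)
kH = (mu_H - mu) ** 2 / sigma**2   # eq. (68)

for mu_hat in (-0.5, mu_L, 1.0, mu_H, 10.5):
    print(mu_hat, k_wald(mu_hat, mu, sigma, mu_L, mu_H, two_sided))
print("k^L =", kL, " k^H =", kH)
```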

4.1 Case where $k^L_\mu < k^H_\mu$

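For the cases where $m_\mu(\hat{\mu}) = 0$, three regions of possible $\tilde{\bar{k}}_\mu$ values are defined, with the associated possible values of $\hat{\mu}$:

$$\begin{aligned} 0 < \tilde{\bar{k}}_\mu < k^L_\mu &: \quad \mu_L < \hat{\mu} < \mu_H, \\ k^L_\mu < \tilde{\bar{k}}_\mu < k^H_\mu &: \quad \hat{\mu} < \mu_H, \\ \tilde{\bar{k}}_\mu > k^H_\mu &: \quad \hat{\mu} < \mu_L \ \text{or}\ \hat{\mu} > \mu_H. \end{aligned} \tag{69}$$

One can construct the cumulative probability distribution for each region of $\tilde{\bar{k}}_\mu$ values. We start with the lowest-value region.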



4.1.1 $0 < \tilde{\bar{k}}_\mu < k^L_\mu$

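The contributions to this region are the cases where $m_\mu(\hat{\mu}) = 1$ and where $(m_\mu(\hat{\mu}) = 0,\ \mu_L < \hat{\mu} < \mu_H)$. One defines the quantity $y$:

$$y = \frac{\mu - \hat{\mu}}{\sigma}, \tag{70}$$

such that $\tilde{\bar{k}}_\mu = y^2$ (this is true in this region because the other two expressions for $\tilde{\bar{k}}_\mu$ given in equation (66) correspond to values of $\hat{\mu}$ that do not contribute to this range of $\tilde{\bar{k}}_\mu$ values). Therefore obtaining a value of the test statistic less than $\tilde{\bar{k}}_\mu$ corresponds either to $m_\mu(\hat{\mu}) = 1$, or to $m_\mu(\hat{\mu}) = 0$ and a value of $y$ in the range:

$$-\sqrt{\tilde{\bar{k}}_\mu} < y < \sqrt{\tilde{\bar{k}}_\mu}. \tag{71}$$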
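Using the distribution of $\hat{\mu}$ (equation (65)), we can deduce that $y = \Delta_y - x$ is also a Gaussian-distributed quantity, with mean and standard deviation given by:

$$\Delta_y = \frac{\mu - \mu'}{\sigma}, \qquad \sigma_y = 1. \tag{72}$$

The cumulative distribution for $\tilde{\bar{k}}_\mu$ in this region is given by:

$$F_1(\tilde{\bar{k}}_\mu|\mu') = P\left(m_\mu(\hat{\mu}) = 1\right) + P\left(-\sqrt{\tilde{\bar{k}}_\mu} < y < \sqrt{\tilde{\bar{k}}_\mu},\ \mu_L < \hat{\mu} < \mu_H,\ m_\mu(\hat{\mu}) = 0\right). \tag{73}$$

The constraint terms can be re-expressed in terms of the random variable $y$:

$$\mu_L < \hat{\mu} < \mu_H \;\rightarrow\; \frac{\mu - \mu_H}{\sigma} < y < \frac{\mu - \mu_L}{\sigma}, \quad \text{i.e.}\ -\sqrt{k^H_\mu} < y < \sqrt{k^L_\mu}, \tag{74}$$

$$m_\mu(\hat{\mu}) = 0 \;\rightarrow\; m_\mu(\mu - \sigma y) = 0. \tag{75}$$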



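The first of these constraints has no effect in this region of $\tilde{\bar{k}}_\mu$ (we are only considering $0 < \tilde{\bar{k}}_\mu < k^L_\mu < k^H_\mu$, so this constraint is always satisfied here). The cumulative distribution can therefore be expressed as:

$$F_1(\tilde{\bar{k}}_\mu|\mu') = P\left(m_\mu(\hat{\mu}) = 1\right) + P\left(-\sqrt{\tilde{\bar{k}}_\mu} < y < \sqrt{\tilde{\bar{k}}_\mu},\ m_\mu(\mu - \sigma y) = 0\right). \tag{76}$$

The second term corresponds to the probability for $y$ to lie between $-\sqrt{\tilde{\bar{k}}_\mu}$ and $\sqrt{\tilde{\bar{k}}_\mu}$ without resulting in $m_\mu(\mu - \sigma y) = 1$. This is equivalent to the probability of $-\sqrt{\tilde{\bar{k}}_\mu} < y < \sqrt{\tilde{\bar{k}}_\mu}$, less the probability of both $-\sqrt{\tilde{\bar{k}}_\mu} < y < \sqrt{\tilde{\bar{k}}_\mu}$ and $m_\mu(\mu - \sigma y) = 1$ being satisfied. Hence

$$F_1(\tilde{\bar{k}}_\mu|\mu') = P\left(m_\mu(\hat{\mu}) = 1\right) + P\left(-\sqrt{\tilde{\bar{k}}_\mu} < y < \sqrt{\tilde{\bar{k}}_\mu}\right) - P\left(-\sqrt{\tilde{\bar{k}}_\mu} < y < \sqrt{\tilde{\bar{k}}_\mu},\ m_\mu(\mu - \sigma y) = 1\right). \tag{77}$$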

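Since the random variables $\hat{\mu}$ and $y$ are Gaussian distributed, the probabilities can be expressed as integrals:

$$F_1(\tilde{\bar{k}}_\mu|\mu') = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(\hat{\mu} - \mu')^2}{2\sigma^2}}m_\mu(\hat{\mu})\,d\hat{\mu} + \int_{-\sqrt{\tilde{\bar{k}}_\mu}}^{\sqrt{\tilde{\bar{k}}_\mu}}\frac{1}{\sqrt{2\pi}}e^{-\frac{(y - \Delta_y)^2}{2}}\,dy - \int_{-\sqrt{\tilde{\bar{k}}_\mu}}^{\sqrt{\tilde{\bar{k}}_\mu}}\frac{1}{\sqrt{2\pi}}e^{-\frac{(y - \Delta_y)^2}{2}}m_\mu(\mu - \sigma y)\,dy. \tag{78}$$

Equations (70) and (65) are used to perform a change of variables to $x = \Delta_y - y = (\hat{\mu} - \mu')/\sigma$, giving:

$$F_1(\tilde{\bar{k}}_\mu|\mu') = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{x^2}{2}}m_\mu(x\sigma + \mu')\,dx + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right) - 1 + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) - \frac{1}{\sqrt{2\pi}}\int_{\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}}^{\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}}e^{-\frac{x^2}{2}}m_\mu(x\sigma + \mu')\,dx, \tag{79}$$

where the following useful relationship was also used:

$$\Phi(x) = 1 - \Phi(-x). \tag{80}$$

Using the masked Gaussian integral defined in equation (13), the cumulative distribution is finally expressed as:

$$F_1(\tilde{\bar{k}}_\mu|\mu') = \Phi_m(\infty) + \Phi_m\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right) + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right) - 1 + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right). \tag{81}$$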

4.1.2 $k^L_\mu < \tilde{\bar{k}}_\mu < k^H_\mu$

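The next region of $\tilde{\bar{k}}_\mu$ values has contributions from the $\hat{\mu} < \mu_L$ and $\mu_L < \hat{\mu} < \mu_H$ regions of possible $\hat{\mu}$ values (for the cases where $m_\mu(\hat{\mu}) = 0$).

Define the random variable $z_L$, which is the value of $\tilde{\bar{k}}_\mu$ when $\hat{\mu} < \mu_L$:

$$z_L = \frac{\mu^2 - \mu_L^2 - 2(\mu - \mu_L)\hat{\mu}}{\sigma^2} = \frac{\mu^2 - \mu_L^2 - 2(\mu - \mu_L)\mu'}{\sigma^2} - \frac{2(\mu - \mu_L)}{\sigma}x = \Delta_L - \sigma_L x, \tag{82}$$

i.e. $z_L$ is Gaussian distributed with mean and standard deviation given by:

$$\Delta_L = \frac{\mu^2 - \mu_L^2 - 2(\mu - \mu_L)\mu'}{\sigma^2}, \qquad \sigma_L = \frac{2(\mu - \mu_L)}{\sigma}. \tag{83}$$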
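The linear relation (82) and the moments (83) can be checked by simulation. The Python sketch below, with hypothetical parameter values and an arbitrary seed, applies the first form of (82) to every sampled $\hat{\mu}$ (not only those below $\mu_L$, since the linear map holds everywhere) and compares the sample mean and standard deviation with $\Delta_L$ and $\sigma_L$.

```python
import math
import random

# Hypothetical parameter values, chosen only for illustration
mu, mu_prime, sigma, mu_L = 3.0, 2.0, 0.7, 0.0
d_L = (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_prime) / sigma**2   # mean, eq. (83)
s_L = 2 * (mu - mu_L) / sigma                                      # std. dev., eq. (83)

rng = random.Random(7)
z = []
for _ in range(100000):
    x = rng.gauss(0.0, 1.0)
    mu_hat = sigma * x + mu_prime                                  # eq. (65)
    z_L = (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_hat) / sigma**2  # eq. (82), first form
    assert abs(z_L - (d_L - s_L * x)) < 1e-9                       # eq. (82), last form
    z.append(z_L)

mean = sum(z) / len(z)
std = math.sqrt(sum((v - mean) ** 2 for v in z) / len(z))
print(mean, d_L)
print(std, s_L)
```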



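For the $\mu_L < \hat{\mu} < \mu_H$ case, the constraint in equation (74) now comes into effect at the upper limit $(\mu - \mu_L)/\sigma = \sqrt{k^L_\mu}$. The cumulative distribution in this region is given by:

$$\begin{aligned} F_2(\tilde{\bar{k}}_\mu|\mu') = {} & \Phi_m(\infty) + \Phi_m\left(\Delta_y - \sqrt{k^L_\mu}\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right) + \Phi\left(\sqrt{k^L_\mu} - \Delta_y\right) - 1 + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) \\ & + P\left(m_\mu(\hat{\mu}) = 0\right)P\left(\hat{\mu} < \mu_L \,|\, m_\mu(\hat{\mu}) = 0\right)P\left(z_L < \tilde{\bar{k}}_\mu \,|\, m_\mu(\hat{\mu}) = 0,\ \hat{\mu} < \mu_L\right). \end{aligned} \tag{84}$$

Re-expressing the constraints in the last line in terms of $z_L$, one has:

$$\hat{\mu} < \mu_L \;\rightarrow\; z_L > k^L_\mu, \tag{85}$$

$$m_\mu(\hat{\mu}) = 0 \;\rightarrow\; m_\mu\left(\frac{z_L\sigma^2 + \mu_L^2 - \mu^2}{2(\mu_L - \mu)}\right) = 0. \tag{86}$$

The distribution can therefore be expressed as:

$$\begin{aligned} F_2(\tilde{\bar{k}}_\mu|\mu') = {} & \Phi_m(\infty) + \Phi_m\left(\Delta_y - \sqrt{k^L_\mu}\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right) + \Phi\left(\sqrt{k^L_\mu} - \Delta_y\right) - 1 + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) \\ & + P\left(k^L_\mu < z_L < \tilde{\bar{k}}_\mu,\ m_\mu\left(\frac{z_L\sigma^2 + \mu_L^2 - \mu^2}{2(\mu_L - \mu)}\right) = 0\right), \end{aligned} \tag{87}$$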
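which can be written in integral form as:

$$\begin{aligned} F_2(\tilde{\bar{k}}_\mu|\mu') = {} & \Phi_m(\infty) + \Phi_m\left(\Delta_y - \sqrt{k^L_\mu}\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right) + \Phi\left(\sqrt{k^L_\mu} - \Delta_y\right) - 1 + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) \\ & + \int_{k^L_\mu}^{\tilde{\bar{k}}_\mu}\frac{1}{\sqrt{2\pi}\,\sigma_L}\exp\left[-\frac{(z_L - \Delta_L)^2}{2\sigma_L^2}\right]\left[1 - m_\mu\left(\frac{z_L\sigma^2 + \mu_L^2 - \mu^2}{2(\mu_L - \mu)}\right)\right]dz_L. \end{aligned} \tag{88}$$

A change of variables using the definition of $z_L$ given in equation (82) gives:

$$\begin{aligned} F_2(\tilde{\bar{k}}_\mu|\mu') = {} & \Phi_m(\infty) + \Phi_m\left(\frac{\mu_L - \mu'}{\sigma}\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right) + \Phi\left(\sqrt{k^L_\mu} - \Delta_y\right) - 1 + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) \\ & + \Phi\left(\frac{\Delta_L - k^L_\mu}{\sigma_L}\right) - \Phi\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) - \Phi_m\left(\frac{\Delta_L - k^L_\mu}{\sigma_L}\right) + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right), \end{aligned} \tag{89}$$

where $\Delta_y - \sqrt{k^L_\mu} = (\mu_L - \mu')/\sigma$ has been used. By noting that

$$\frac{\Delta_L - k^L_\mu}{\sigma_L} = \frac{(\mu + \mu_L)(\mu - \mu_L) - 2(\mu - \mu_L)\mu' - (\mu - \mu_L)^2}{2(\mu - \mu_L)\sigma} = \frac{\mu_L - \mu'}{\sigma}, \tag{90}$$

and that $\Phi(\sqrt{k^L_\mu} - \Delta_y) = \Phi((\mu' - \mu_L)/\sigma) = 1 - \Phi((\mu_L - \mu')/\sigma)$, the cumulative distribution simplifies to

$$F_2(\tilde{\bar{k}}_\mu|\mu') = \Phi_m(\infty) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right) - 1 + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right) + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right). \tag{91}$$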
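The identity (90), and the cancellation $\Phi(\sqrt{k^L_\mu} - \Delta_y) + \Phi((\mu_L - \mu')/\sigma) = 1$ used to simplify (89) into (91), can be spot-checked over random parameter values. This Python sketch assumes $\mu > \mu_L$, as in the text; the sampling ranges and seed are arbitrary.

```python
import math
import random


def Phi(x):
    """Standard Gaussian cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


rng = random.Random(3)
max_err = 0.0
for _ in range(1000):
    mu_L = rng.uniform(-5.0, 0.0)
    mu = mu_L + rng.uniform(0.1, 5.0)          # the text assumes mu > mu_L
    mu_prime = rng.uniform(-5.0, 5.0)
    sigma = rng.uniform(0.2, 3.0)
    kL = (mu - mu_L) ** 2 / sigma**2                                  # eq. (6)
    d_y = (mu - mu_prime) / sigma                                     # eq. (8)
    d_L = (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_prime) / sigma**2   # eq. (9)
    s_L = 2 * (mu - mu_L) / sigma                                     # eq. (10)
    rhs = (mu_L - mu_prime) / sigma
    err_identity = abs((d_L - kL) / s_L - rhs)                        # eq. (90)
    err_cancel = abs(Phi(math.sqrt(kL) - d_y) + Phi(rhs) - 1.0)       # (89) -> (91)
    max_err = max(max_err, err_identity, err_cancel)

print(max_err)
```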
4.1.3 $\tilde{\bar{k}}_\mu > k^H_\mu$

The two regions of possible $\hat{\mu}$ values contributing to this region of $\tilde{\bar{k}}_\mu$ are $\hat{\mu} < \mu_L$ and $\hat{\mu} > \mu_H$. Introduce the random variable $z_H$ defined as:

$$z_H = \frac{\mu^2 - \mu_H^2 - 2(\mu - \mu_H)\hat{\mu}}{\sigma^2} = \frac{\mu^2 - \mu_H^2 - 2(\mu - \mu_H)\mu'}{\sigma^2} + \frac{2(\mu_H - \mu)}{\sigma}x = \Delta_H + \sigma_H x, \tag{93}$$

i.e. $z_H$ is Gaussian distributed with mean and standard deviation given by:

$$\Delta_H = \frac{\mu^2 - \mu_H^2 - 2(\mu - \mu_H)\mu'}{\sigma^2}, \qquad \sigma_H = \frac{2(\mu_H - \mu)}{\sigma}. \tag{94}$$



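Both limits in equation (74) apply to contributions coming from $(m_\mu(\hat{\mu}) = 0,\ \mu_L < \hat{\mu} < \mu_H)$ (i.e. whenever this case occurs it results in a test statistic value smaller than $k^H_\mu$), and the constraints on $z_H$ are similar to the constraints on $z_L$ given in the last section:

$$\begin{aligned} F_3(\tilde{\bar{k}}_\mu|\mu') = {} & \Phi_m(\infty) - \Phi_m\left(\Delta_y + \sqrt{k^H_\mu}\right) - 1 + \Phi\left(\sqrt{k^H_\mu} + \Delta_y\right) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right) + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) \\ & + P\left(k^H_\mu < z_H < \tilde{\bar{k}}_\mu,\ m_\mu(\hat{\mu}) = 0\right). \end{aligned} \tag{95}$$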



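Note that:

$$\frac{k^H_\mu - \Delta_H}{\sigma_H} = \frac{\mu_H - \mu'}{\sigma} = \Delta_y + \sqrt{k^H_\mu}. \tag{96}$$

Using this and changing variables as before, one finds:

$$\begin{aligned} F_3(\tilde{\bar{k}}_\mu|\mu') = {} & \Phi_m(\infty) - \Phi_m\left(\Delta_y + \sqrt{k^H_\mu}\right) - 1 + \Phi\left(\sqrt{k^H_\mu} + \Delta_y\right) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right) + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) \\ & + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right) - \Phi\left(\Delta_y + \sqrt{k^H_\mu}\right) - \Phi_m\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right) + \Phi_m\left(\Delta_y + \sqrt{k^H_\mu}\right) \\ = {} & \Phi_m(\infty) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right) - 1 + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) - \Phi_m\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right). \end{aligned} \tag{97}$$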



4.2 Case where $k^L_\mu > k^H_\mu$

In this case, the test statistic is being formed for a value of $\mu$ closer to the upper physical bound ($\mu_H$) than to the lower physical bound ($\mu_L$). The possible values of $\tilde{\bar{k}}_\mu$ are again divided into three regions, and the cumulative distribution is determined for each.

4.2.1 $0 < \tilde{\bar{k}}_\mu < k^H_\mu$

In this region, only the $\mu_L < \hat{\mu} < \mu_H$ values contribute, so again the cumulative distribution is found to be $F_1(\tilde{\bar{k}}_\mu|\mu')$, given in equation (81).
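4.2.2 $k^H_\mu < \tilde{\bar{k}}_\mu < k^L_\mu$

The lower limit of the constraint in equation (74), $(\mu - \mu_H)/\sigma = -\sqrt{k^H_\mu}$, is reached. The cumulative distribution is given by:

$$\begin{aligned} F_2(\tilde{\bar{k}}_\mu|\mu') = {} & \Phi_m(\infty) + \Phi_m\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) - \Phi_m\left(\Delta_y + \sqrt{k^H_\mu}\right) + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right) - 1 + \Phi\left(\sqrt{k^H_\mu} + \Delta_y\right) \\ & + P\left(k^H_\mu < z_H < \tilde{\bar{k}}_\mu,\ m_\mu(\hat{\mu}) = 0\right) \\ = {} & \Phi_m(\infty) + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right) - 1 + \Phi_m\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) - \Phi_m\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right), \end{aligned} \tag{98}$$

where the second equality follows from changing variables to $x$ as in section 4.1.3 and using equation (96).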



4.2.3 $\tilde{\bar{k}}_\mu > k^L_\mu$

Both limits of equation (74) are reached, and the cumulative distribution becomes identical to $F_3(\tilde{\bar{k}}_\mu|\mu')$ given in equation (97).

4.3 Summary of distribution 

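The cumulative distribution in full is given by:

$$F(\tilde{\bar{k}}_\mu|\mu') = \begin{cases} \Phi_m(\infty) + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right) + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) - 1 + \Phi_m\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right), & \tilde{\bar{k}}_\mu < \min(k^L_\mu, k^H_\mu), \\ \Phi_m(\infty) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right) + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right) - 1 + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) - \Phi_m\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right), & k^L_\mu \leq \tilde{\bar{k}}_\mu < k^H_\mu, \\ \Phi_m(\infty) + \Phi\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right) - 1 + \Phi_m\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) - \Phi_m\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right), & k^H_\mu \leq \tilde{\bar{k}}_\mu < k^L_\mu, \\ \Phi_m(\infty) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right) + \Phi\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right) - 1 + \Phi_m\left(\frac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) - \Phi_m\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right), & \tilde{\bar{k}}_\mu \geq \max(k^L_\mu, k^H_\mu). \end{cases} \tag{99}$$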
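As an end-to-end check of the summary (99), one can draw $\hat{\mu}$ from the Wald Gaussian, compute $\tilde{\bar{k}}_\mu$ via equation (66), and compare the empirical cumulative distribution with the formula. The Python sketch below uses the toy-study bounds of equations (61) and (62) and the compatibility function (63), but $\sigma = 3$ and the $\mu'$ values are arbitrary choices, and $m_\mu$ is represented as a union of $\hat{\mu}$ intervals (an assumption of this sketch). It covers both orderings of $k^L_\mu$ and $k^H_\mu$. Note that this only tests the algebra under the Wald approximation, not the approximation itself, which is what the full-likelihood study of section 3 addresses.

```python
import math
import random


def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def summary_cdf(k, mu, mu_prime, sigma, mu_L, mu_H, intervals):
    """Piecewise CDF of eq. (99) for finite bounds mu_L < mu < mu_H;
    m_mu = 1 exactly on the union of mu_hat `intervals` (an assumption)."""
    def Pm(x):  # masked Gaussian integral, eq. (13)
        return sum(max(0.0, Phi(min(x, (b - mu_prime) / sigma)) - Phi((a - mu_prime) / sigma))
                   for a, b in intervals)
    d_y = (mu - mu_prime) / sigma
    kL, kH = ((mu - mu_L) / sigma) ** 2, ((mu_H - mu) / sigma) ** 2
    d_L = (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_prime) / sigma**2
    d_H = (mu**2 - mu_H**2 - 2 * (mu - mu_H) * mu_prime) / sigma**2
    s_L, s_H = 2 * (mu - mu_L) / sigma, 2 * (mu_H - mu) / sigma
    r = math.sqrt(k)
    a_L, a_H = (k - d_L) / s_L, (k - d_H) / s_H
    base = Pm(math.inf) - 1.0
    if k < min(kL, kH):
        return base + Phi(r - d_y) + Phi(r + d_y) + Pm(d_y - r) - Pm(d_y + r)
    if kL <= k < kH:
        return base + Phi(a_L) + Phi(r + d_y) + Pm(-a_L) - Pm(d_y + r)
    if kH <= k < kL:
        return base + Phi(r - d_y) + Phi(a_H) + Pm(d_y - r) - Pm(a_H)
    return base + Phi(a_L) + Phi(a_H) + Pm(-a_L) - Pm(a_H)


def k_wald(mu_hat, mu, sigma, mu_L, mu_H, intervals):
    """Test statistic under the Wald approximation, eq. (66)."""
    if any(a <= mu_hat <= b for a, b in intervals):
        return 0.0
    if mu_hat < mu_L:
        return (mu**2 - mu_L**2 - 2 * (mu - mu_L) * mu_hat) / sigma**2
    if mu_hat > mu_H:
        return (mu**2 - mu_H**2 - 2 * (mu - mu_H) * mu_hat) / sigma**2
    return (mu - mu_hat) ** 2 / sigma**2


def toy_k(mu, mu_prime, sigma, mu_L, mu_H, intervals, n=50000, seed=11):
    """Sample mu_hat ~ N(mu', sigma) and return the resulting test statistics."""
    rng = random.Random(seed)
    return [k_wald(rng.gauss(mu_prime, sigma), mu, sigma, mu_L, mu_H, intervals)
            for _ in range(n)]


# Toy-study bounds of eqs. (61)-(62); sigma = 3 is an arbitrary choice.
MU_L, MU_H, SIGMA = -5.0, 20.0, 3.0
cases = [(-2.0, 0.0, [(-math.inf, -2.0)]),   # mu < 4: mu_hat < mu compatible
         (10.0, 8.0, [(10.0, math.inf)])]    # mu > 4: mu_hat > mu compatible
for mu, mu_prime, intervals in cases:
    ks = toy_k(mu, mu_prime, SIGMA, MU_L, MU_H, intervals)
    for k in (0.0, 0.5, 2.0, 15.0, 30.0):
        empirical = sum(v <= k for v in ks) / len(ks)
        predicted = summary_cdf(k, mu, mu_prime, SIGMA, MU_L, MU_H, intervals)
        print(mu, k, round(empirical, 4), round(predicted, 4))
```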
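The pdf is found by differentiation to be:

$$f(\tilde{\bar{k}}_\mu|\mu') = \Phi_m(\infty)\,\delta(\tilde{\bar{k}}_\mu) + \begin{cases} \dfrac{1}{2\sqrt{2\pi\tilde{\bar{k}}_\mu}}\left[e^{-\frac{1}{2}\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right)^2} n_\mu\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) + e^{-\frac{1}{2}\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right)^2} n_\mu\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right)\right], & \tilde{\bar{k}}_\mu < \min(k^L_\mu, k^H_\mu), \\ \dfrac{1}{\sqrt{2\pi}\,\sigma_L}e^{-\frac{1}{2}\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right)^2} n_\mu\left(\dfrac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) + \dfrac{1}{2\sqrt{2\pi\tilde{\bar{k}}_\mu}}e^{-\frac{1}{2}\left(\sqrt{\tilde{\bar{k}}_\mu} + \Delta_y\right)^2} n_\mu\left(\Delta_y + \sqrt{\tilde{\bar{k}}_\mu}\right), & k^L_\mu \leq \tilde{\bar{k}}_\mu < k^H_\mu, \\ \dfrac{1}{2\sqrt{2\pi\tilde{\bar{k}}_\mu}}e^{-\frac{1}{2}\left(\sqrt{\tilde{\bar{k}}_\mu} - \Delta_y\right)^2} n_\mu\left(\Delta_y - \sqrt{\tilde{\bar{k}}_\mu}\right) + \dfrac{1}{\sqrt{2\pi}\,\sigma_H}e^{-\frac{1}{2}\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right)^2} n_\mu\left(\dfrac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right), & k^H_\mu \leq \tilde{\bar{k}}_\mu < k^L_\mu, \\ \dfrac{1}{\sqrt{2\pi}\,\sigma_L}e^{-\frac{1}{2}\left(\frac{\tilde{\bar{k}}_\mu - \Delta_L}{\sigma_L}\right)^2} n_\mu\left(\dfrac{\Delta_L - \tilde{\bar{k}}_\mu}{\sigma_L}\right) + \dfrac{1}{\sqrt{2\pi}\,\sigma_H}e^{-\frac{1}{2}\left(\frac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right)^2} n_\mu\left(\dfrac{\tilde{\bar{k}}_\mu - \Delta_H}{\sigma_H}\right), & \tilde{\bar{k}}_\mu \geq \max(k^L_\mu, k^H_\mu), \end{cases} \tag{100}$$

where

$$n_\mu(x) = 1 - m_\mu(x\sigma + \mu'). \tag{101}$$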

The formulae presented in equations (5) and (15) are the same as the above formulae, expressed in a more compact notation using the discrete Heaviside function.



5 Conclusion 

The asymptotic distribution of a general likelihood-based test statistic has been derived; it covers all previously described asymptotic distributions as special cases of the general formula presented here.